Statistical and Machine Learning

Lab5: Support Vector Machine (SVM) and Support Vector Regression (SVR)

Tsai, Dai-Rong

Dataset

Glass Identification Dataset

# Set random seed
set.seed(1234)

# Packages
library(e1071) # for svm, tune

# Data
data(Glass, package = "mlbench")
  • Response
    • Type: Type of glass (1, 2, 3, 5, 6, 7)
  • Predictors (Unit: weight percent in corresponding oxide)
    • RI: Refractive Index
    • Na: Sodium
    • Mg: Magnesium
    • Al: Aluminum
    • Si: Silicon
    • K: Potassium
    • Ca: Calcium
    • Ba: Barium
    • Fe: Iron
dim(Glass)
[1] 214  10
head(Glass)
      RI    Na   Mg   Al    Si    K   Ca Ba   Fe Type
1 1.5210 13.64 4.49 1.10 71.78 0.06 8.75  0 0.00    1
2 1.5176 13.89 3.60 1.36 72.73 0.48 7.83  0 0.00    1
3 1.5162 13.53 3.55 1.54 72.99 0.39 7.78  0 0.00    1
4 1.5177 13.21 3.69 1.29 72.61 0.57 8.22  0 0.00    1
5 1.5174 13.27 3.62 1.24 73.08 0.55 8.07  0 0.00    1
6 1.5160 12.79 3.61 1.62 72.97 0.64 8.07  0 0.26    1
proportions(table(Glass$Type)) * 100

      1       2       3       5       6       7 
32.7103 35.5140  7.9439  6.0748  4.2056 13.5514 
str(Glass)
'data.frame':   214 obs. of  10 variables:
 $ RI  : num  1.52 1.52 1.52 1.52 1.52 ...
 $ Na  : num  13.6 13.9 13.5 13.2 13.3 ...
 $ Mg  : num  4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
 $ Al  : num  1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
 $ Si  : num  71.8 72.7 73 72.6 73.1 ...
 $ K   : num  0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
 $ Ca  : num  8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
 $ Ba  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Fe  : num  0 0 0 0 0 0.26 0 0 0 0.11 ...
 $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...

Create Training/Testing Partitions

  • Split the data into an 80% training set and a 20% test set
nr <- nrow(Glass)
train.id <- sample(nr, nr * 0.8)

training <- Glass[train.id, ]
testing <- Glass[-train.id, ]
  • Check dimension
dim(training)
[1] 171  10
dim(testing)
[1] 43 10
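Note that sample() truncates a non-integer size argument, so nr * 0.8 = 171.2 yields 171 training rows and 43 test rows. A quick base-R check (a sketch; the numbers match the dimensions above):

```r
# sample() truncates a non-integer `size`, so nr * 0.8 = 171.2 becomes 171
nr <- 214
train.size <- length(sample(nr, nr * 0.8))
train.size        # 171 training rows
nr - train.size   # 43 testing rows
```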

Support Vector Machine (SVM)

The svm() function in the R package e1071 is based on the C/C++ code of LIBSVM, developed by Chih-Chung Chang and Chih-Jen Lin.

svm.mod <- svm(Type ~ ., data = training, probability = TRUE)

Arguments

  • type
    • "C-classification": default when y is a factor.
    • "eps-regression": default when y is numeric.
    • "nu-classification": allows more control over the number of support vectors by specifying an additional parameter \(\nu\) which approximates the fraction of support vectors.
    • "nu-regression": the \(\nu\) counterpart of "eps-regression".
    • "one-classification": one-class classification for outlier/novelty detection.
  • cost: (default: 1) cost of constraints violation. It’s the \(C\)-constant of the regularization term in the Lagrange formulation.
  • epsilon: (default: 0.1) \(\epsilon\) in the \(\epsilon\)-insensitive loss function for "eps-regression" mode.
  • kernel
kernel               formula                                          parameters
"linear"             \(\pmb{u}^T \pmb{v}\)                            (none)
"polynomial"         \((\gamma \pmb{u}^T \pmb{v} + c_0)^d\)           \(\gamma\) (gamma), \(d\) (degree), \(c_0\) (coef0)
"radial" (default)   \(\exp\{-\gamma \|\pmb{u} - \pmb{v}\|^2\}\)      \(\gamma\) (gamma)
"sigmoid"            \(\tanh\{\gamma \pmb{u}^T \pmb{v} + c_0\}\)      \(\gamma\) (gamma), \(c_0\) (coef0)
  • probability: (default: FALSE) logical indicating whether the model should allow for probability predictions.
  • scale: (default: TRUE) A logical vector indicating the variables to be scaled. The center and scale values are returned and used for later predictions.
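The kernel formulas in the table above can be evaluated directly; a minimal base-R sketch (the vectors u, v and the parameter values are arbitrary illustrations):

```r
u <- c(1, 2, 3); v <- c(2, 0, 1)
gamma <- 0.5; coef0 <- 1; degree <- 2

linear  <- sum(u * v)                           # u'v = 5
poly    <- (gamma * sum(u * v) + coef0)^degree  # (0.5 * 5 + 1)^2 = 12.25
radial  <- exp(-gamma * sum((u - v)^2))         # exp(-0.5 * 9) = exp(-4.5)
sigmoid <- tanh(gamma * sum(u * v) + coef0)     # tanh(3.5)
```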
summary(svm.mod)

Call:
svm(formula = Type ~ ., data = training, probability = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  146

 ( 44 56 14 11 8 13 )


Number of Classes:  6 

Levels: 
 1 2 3 5 6 7
names(svm.mod)
 [1] "call"            "type"            "kernel"          "cost"            "degree"         
 [6] "gamma"           "coef0"           "nu"              "epsilon"         "sparse"         
[11] "scaled"          "x.scale"         "y.scale"         "nclasses"        "levels"         
[16] "tot.nSV"         "nSV"             "labels"          "SV"              "index"          
[21] "rho"             "compprob"        "probA"           "probB"           "sigma"          
[26] "coefs"           "na.action"       "xlevels"         "fitted"          "decision.values"
[31] "terms"          

Decision Values

For multiclass classification with k > 2 levels, libsvm uses the "one-against-one" approach, in which \(C^k_2 = k(k-1)/2\) binary classifiers are trained; the appropriate class is chosen by a voting scheme.
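The voting scheme can be illustrated with the first row of decision values shown below (a base-R sketch; the numbers are copied verbatim from observation 28 of svm.mod$decision.values):

```r
# For k = 6 classes, one-against-one trains choose(6, 2) = 15 binary classifiers
classes <- c("1", "2", "3", "5", "6", "7")
pairs <- combn(classes, 2)   # columns: 1/2, 1/3, ..., 6/7 (same order as below)
ncol(pairs)                  # 15

# Decision values of observation 28; a positive value in column "i/j" is a
# vote for class i, a negative one is a vote for class j
dv <- c(0.292535, 0.99966, 1.10787, 1.2246735, 1.108958, 1.03402, 1.372209,
        1.2381310, 1.099412, 0.99571536, 1.093102, 0.9769175, -0.0759031,
        -0.09349189, -0.0813850)
votes <- ifelse(dv > 0, pairs[1, ], pairs[2, ])
names(which.max(table(votes)))   # "1" wins with 5 of the 15 votes
```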

svm.mod$decision.values
          1/2     1/3      1/5        1/6       1/7     2/3       2/5        2/6       2/7
28   0.292535 0.99966  1.10787  1.2246735  1.108958 1.03402  1.372209  1.2381310  1.099412
80  -1.594201 0.84064  0.65284  0.9928962  0.751334 1.12481  1.274242  1.2860758  1.052939
150  0.116581 0.80937  0.85689  1.2228743  0.975003 0.95740  1.225425  1.4694638  1.057107
101 -0.864723 1.06109  1.02250  1.0827956  1.144494 1.03995  1.079123  1.2842521  1.095934
111 -0.999911 0.76688 -0.56814  0.1755648 -0.304030 1.01252  1.000043  1.0271130  1.019271
137  0.271718 1.12526  1.50743  1.4422507  1.360063 1.09029  1.538447  1.3573783  1.164290
            3/5       3/6        3/7        5/6         5/7        6/7
28   0.99571536  1.093102  0.9769175 -0.0759031 -0.09349189 -0.0813850
80   0.77675073  0.895017  0.6493944  0.1944002 -0.15166300 -0.3065743
150  0.85353841  1.137762  0.9998229  0.3173727  0.12916959 -0.2936433
101  0.45464507  0.707496  0.6678935  0.4476908  0.27228587 -0.2818009
111 -0.62493705  0.064317 -0.3932627  0.6247875  0.12914815 -0.4230444
137  1.12630990  1.259437  1.1032270 -0.0741291 -0.14239007 -0.1555657
 [ reached getOption("max.print") -- omitted 165 rows ]


Cost vs. Number of support vectors

cost <- 1:1000
nSV <- sapply(cost, \(C) svm(Type ~ ., Glass, cost = C)$tot.nSV)
plot(cost, nSV, xlab = "Cost", ylab = "Number of support vectors", type = 'l')

Parameter Tuning

svm.tune <- tune(svm, Type ~ ., data = training,
                 kernel = "radial", probability = TRUE,
                 ranges = list(cost = 10^seq(0, 2, len = 10), 
                              gamma = seq(0.1, 1, len = 10)),
                 tunecontrol = tune.control(cross = 10, nrepeat = 5))
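For scale: the grid above has 10 × 10 = 100 parameter combinations, each evaluated with 10-fold cross-validation repeated 5 times (a base-R sketch of the bookkeeping):

```r
grid <- expand.grid(cost = 10^seq(0, 2, len = 10), gamma = seq(0.1, 1, len = 10))
nrow(grid)            # 100 parameter combinations
nrow(grid) * 10 * 5   # 5000 SVM fits in total
```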

Arguments of tune.control()

  • random: if an integer value is specified, that many random parameter vectors are drawn from the parameter space.
  • sampling
    • "cross" (default): (repeated) cross-validation with cross (default: 10) folds.
    • "fix": validation-set approach. A single training/validation split is used, with the training set containing a fraction fix (default: 2/3) of the supplied data.
    • "bootstrap": nboot (default: 10) training sets of size boot.size (default: 0.9) are sampled with replacement from the supplied data.
  • error.fun: a function returning the error measure to be minimized. By default:
    • misclassification error for categorical predictions
    • MSE for numeric predictions
    • alternatively, a custom function with two arguments: a vector of true values and a vector of predicted values.
  • See ?plot.tune
plot(svm.tune, transform.x = log10, xlab = "log(Cost)", 
     color.palette = hcl.colors)
plot(svm.tune, transform.x = log10, xlab = "log(Cost)",
     color.palette = \(n) hcl.colors(n, palette = "RdYlBu", rev = TRUE))

summary(svm.tune)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
   cost gamma
 2.7826   0.4

- best performance: 0.30392 

- Detailed performance results:
       cost gamma   error dispersion
1    1.0000   0.1 0.36307   0.139114
2    1.6681   0.1 0.34542   0.140671
3    2.7826   0.1 0.33366   0.133792
4    4.6416   0.1 0.32778   0.142481
5    7.7426   0.1 0.32190   0.119023
6   12.9155   0.1 0.35098   0.117810
7   21.5443   0.1 0.35131   0.118664
8   35.9381   0.1 0.36275   0.095724
9   59.9484   0.1 0.34510   0.064834
10 100.0000   0.1 0.35098   0.073627
11   1.0000   0.2 0.33954   0.124187
12   1.6681   0.2 0.33366   0.133792
13   2.7826   0.2 0.33366   0.133792
14   4.6416   0.2 0.32778   0.122141
15   7.7426   0.2 0.35131   0.130984
16  12.9155   0.2 0.34542   0.094991
17  21.5443   0.2 0.36863   0.096856
18  35.9381   0.2 0.35098   0.100173
19  59.9484   0.2 0.34542   0.109995
20 100.0000   0.2 0.33954   0.092207
21   1.0000   0.3 0.32778   0.118952
22   1.6681   0.3 0.34510   0.125449
23   2.7826   0.3 0.33366   0.121756
24   4.6416   0.3 0.34510   0.131436
25   7.7426   0.3 0.34510   0.093900
 [ reached 'max' / getOption("max.print") -- omitted 75 rows ]
svm.mod.best <- svm.tune$best.model
svm.mod.best

Call:
best.tune(METHOD = svm, train.x = Type ~ ., data = training, ranges = list(cost = 10^seq(0, 
    2, len = 10), gamma = seq(0.1, 1, len = 10)), tunecontrol = tune.control(cross = 10, 
    nrepeat = 5), kernel = "radial", probability = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  2.7826 

Number of Support Vectors:  144

Prediction

  • See ?predict.svm
predict(svm.mod.best, testing)
  1   5   7  13  15  18  24  27  31  33  34  39  44  45  50  52  54  55  59  64  69  74  75  76  82 
  1   1   1   2   1   1   1   2   1   2   1   1   1   2   1   2   1   1   1   1   1   2   2   2   2 
 89  95 105 107 118 124 125 139 148 158 162 167 171 181 189 192 194 210 
  2   2   2   2   2   2   7   2   2   1   2   5   2   2   2   7   7   7 
Levels: 1 2 3 5 6 7
predict(svm.mod.best, testing, decision.values = TRUE)
  1   5   7  13  15  18  24  27  31  33  34  39  44  45  50  52  54  55  59  64  69  74  75  76  82 
  1   1   1   2   1   1   1   2   1   2   1   1   1   2   1   2   1   1   1   1   1   2   2   2   2 
 89  95 105 107 118 124 125 139 148 158 162 167 171 181 189 192 194 210 
  2   2   2   2   2   2   7   2   2   1   2   5   2   2   2   7   7   7 
attr(,"decision.values")
         1/2      1/3       1/5     1/6      1/7       2/3       2/5     2/6       2/7         3/5
1    0.32656  1.23646  0.945776 1.27696  0.87085 -0.013456  0.570500 0.65539  0.036037  0.77562038
5    0.58500  1.45703  1.180355 1.34424  1.17131  1.165966  1.065380 1.03844  1.056457  0.93237156
7    0.95073  1.25930  1.125140 1.36385  1.12514  0.727791  1.012334 0.97141  0.987988  0.94144473
13  -0.32258  1.23629  1.137753 1.08767  1.14233  1.182432  0.964825 0.99474  0.959257  0.21194762
15   1.56178  1.12761  1.018092 1.03184  0.99859  0.840087  1.224943 1.05627  0.986696  0.83389999
18   1.05209  0.25985  0.799816 0.98317  0.82377 -0.775979  0.502618 0.58081  0.076317  0.87849483
          3/6       3/7       5/6       5/7       6/7
1    0.874399  0.411398  0.076956 -0.481653 -0.525187
5    1.115381  0.840473  0.058309 -0.224737 -0.259096
7    1.119593  0.850269  0.038978 -0.231438 -0.241715
13   0.407904  0.110483  0.294264 -0.127934 -0.428863
15   0.940160  0.811785  0.263323 -0.071382 -0.360033
18   0.953800  0.817293  0.166359 -0.306638 -0.451753
 [ reached getOption("max.print") -- omitted 37 rows ]
Levels: 1 2 3 5 6 7
predict(svm.mod.best, testing, probability = TRUE)
  1   5   7  13  15  18  24  27  31  33  34  39  44  45  50  52  54  55  59  64  69  74  75  76  82 
  1   1   1   2   1   1   1   2   1   2   1   1   1   2   1   2   1   1   1   1   1   2   2   2   2 
 89  95 105 107 118 124 125 139 148 158 162 167 171 181 189 192 194 210 
  2   2   2   2   2   2   7   2   2   1   2   5   5   2   2   7   7   7 
attr(,"probabilities")
           1        2        3         5         6         7
1   0.580271 0.209516 0.094288 0.0236161 0.0201883 0.0721209
5   0.698248 0.257851 0.027497 0.0043023 0.0052842 0.0068172
7   0.778522 0.160631 0.042924 0.0045560 0.0052953 0.0080714
13  0.356226 0.563184 0.037874 0.0140968 0.0121277 0.0164924
15  0.847275 0.073266 0.048311 0.0072876 0.0113956 0.0124654
18  0.468820 0.096476 0.348980 0.0217335 0.0213301 0.0426617
24  0.752528 0.181034 0.050745 0.0037593 0.0055343 0.0063993
27  0.313138 0.611588 0.059134 0.0047697 0.0058830 0.0054865
31  0.631403 0.311692 0.035062 0.0055539 0.0084980 0.0077909
33  0.142117 0.791557 0.036194 0.0100343 0.0096471 0.0104496
34  0.843757 0.091416 0.033047 0.0072846 0.0127807 0.0117149
39  0.746846 0.090331 0.102106 0.0148111 0.0166073 0.0292988
44  0.790148 0.070403 0.088814 0.0096693 0.0131961 0.0277695
45  0.137635 0.775696 0.036580 0.0174921 0.0141469 0.0184506
50  0.600873 0.216876 0.124864 0.0118812 0.0201650 0.0253406
52  0.328427 0.544334 0.085994 0.0104483 0.0130353 0.0177620
 [ reached getOption("max.print") -- omitted 27 rows ]
Levels: 1 2 3 5 6 7
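Each row of attr(, "probabilities") is a distribution over the six classes, so it sums to 1. Checking the first row above (values copied verbatim; a base-R sketch):

```r
p1 <- c(0.580271, 0.209516, 0.094288, 0.0236161, 0.0201883, 0.0721209)
sum(p1)   # 1, up to the rounding of the printed output
```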
  • Contingency table & Accuracy
pred.svm <- predict(svm.mod.best, testing)
table(true = testing$Type, pred = pred.svm)
    pred
true  1  2  3  5  6  7
   1 16  5  0  0  0  0
   2  0 11  0  0  0  1
   3  1  2  0  0  0  0
   5  0  1  0  1  0  0
   6  0  1  0  0  0  0
   7  0  1  0  0  0  3
mean(testing$Type == pred.svm)
[1] 0.72093
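The accuracy equals the trace of the contingency table divided by its total. Rebuilding the table above in base R as a check (note that the minority classes 3 and 6 are never predicted, consistent with the class imbalance seen earlier):

```r
tab <- rbind(`1` = c(16,  5, 0, 0, 0, 0),
             `2` = c( 0, 11, 0, 0, 0, 1),
             `3` = c( 1,  2, 0, 0, 0, 0),
             `5` = c( 0,  1, 0, 1, 0, 0),
             `6` = c( 0,  1, 0, 0, 0, 0),
             `7` = c( 0,  1, 0, 0, 0, 3))
sum(diag(tab)) / sum(tab)   # 31 / 43 = 0.72093
```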

Support Vector Regression (SVR)

  • Simulation Data
x <- runif(100)
y <- log(x) + rnorm(length(x), sd = 0.25)
df <- data.frame(x = x, y = y)
newdat <- data.frame(x = seq(min(x), max(x), len = 1000))
  • Default Model
svr.mod <- svm(y ~ x, data = df) # type = "eps-regression" by default
summary(svr.mod)

Call:
svm(formula = y ~ x, data = df)


Parameters:
   SVM-Type:  eps-regression 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  1 
    epsilon:  0.1 


Number of Support Vectors:  65
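The epsilon parameter above defines the \(\epsilon\)-insensitive loss used by "eps-regression": residuals smaller than \(\epsilon\) in absolute value cost nothing, which is what lets points inside the tube avoid becoming support vectors. A base-R sketch (eps_loss is an illustrative helper, not part of e1071):

```r
# epsilon-insensitive loss: max(|residual| - epsilon, 0)
eps_loss <- function(residual, epsilon = 0.1) pmax(abs(residual) - epsilon, 0)
eps_loss(c(-0.3, -0.05, 0, 0.08, 0.25))   # 0.20 0.00 0.00 0.00 0.15
```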
  • Tuned Model
# Use MSE as error measure by default
svr.tune <- tune(svm, y ~ x, data = df,
                 kernel = "radial",
                 ranges = list(cost = 10^seq(0, 3, len = 10),
                              epsilon = seq(0.1, 2, len = 10),
                              gamma = seq(0.1, 2, len = 10)),
                 tunecontrol = tune.control(cross = 5, nrepeat = 5))

# summary(svr.tune)
svr.mod.best <- svr.tune$best.model
  • Prediction
pred.svr.1 <- predict(svr.mod, newdat)
pred.svr.2 <- predict(svr.mod.best, newdat)
Code for the plot:
plot(df, cex = 0.5)
lines(newdat$x, log(newdat$x), col = 1)
lines(newdat$x, pred.svr.1, col = 4)
lines(newdat$x, pred.svr.2, col = 2)
legend("bottomright", legend = c("Truth", "pred.svr.1", "pred.svr.2"), lty = 1, col = c(1, 4, 2))
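To compare the two fits numerically rather than visually, the mean squared error against the noiseless truth log(x) can be computed (mse below is an illustrative helper, not from e1071):

```r
mse <- function(pred, truth) mean((pred - truth)^2)
# e.g. mse(pred.svr.1, log(newdat$x)) vs. mse(pred.svr.2, log(newdat$x))
mse(c(1, 2, 3), c(1, 2, 5))   # sanity check: 4/3
```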

Tips on practical use

Source: Support Vector Machines: The Interface to libsvm in package e1071 by David Meyer.